Skip to content

BUG: Add warning if rows have more columns than expected #33782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 39 commits into from

Conversation

mproszewska
Copy link
Contributor

@mproszewska
Copy link
Contributor Author

Multiple tests are not passing due to the new warning.
Before fixing this, I want to ask if this solution is okay?

@gfyoung gfyoung added API Design IO CSV read_csv, to_csv labels May 3, 2020
@@ -2508,6 +2512,13 @@ def read(self, rows=None):
content = content[1:]

alldata = self._rows_to_cols(content)
if len(columns) != len(alldata) and notna(alldata[len(columns) :]).any():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need the notna check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In example mentioned in linked issue additional comma was added in one row. I assumed that additional commas are common and hence we might ignore them and don't raise a warning.
I'm using notna to check if data that won't be included contains only NaN values.

@gfyoung
Copy link
Member

gfyoung commented May 3, 2020

@mproszewska : Sorry for the long wait here! Overall, the solution is on the right track.

@pep8speaks
Copy link

pep8speaks commented May 10, 2020

Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-08 19:27:23 UTC

@mproszewska
Copy link
Contributor Author

I have no idea how to check where this ipython directive error comes from.

@jbrockmendel
Copy link
Member

can you rebase and ill take a look at the ipython thing

@mproszewska
Copy link
Contributor Author

mproszewska commented Oct 8, 2020

can you rebase and ill take a look at the ipython thing

I rebased it

@jbrockmendel
Copy link
Member

on the docbuild, it looks like the following is issuing a warning

data = "a,b,c\n4,apple,bat,\n8,orange,cow,"
pd.read_csv(StringIO(data), index_col=False)

under this PR, is issuing a warning here the correct thing to do? If so, then an :okwarning: needs to go in io.rst on L756

@mproszewska
Copy link
Contributor Author

on the docbuild, it looks like the following is issuing a warning

data = "a,b,c\n4,apple,bat,\n8,orange,cow,"
pd.read_csv(StringIO(data), index_col=False)

under this PR, is issuing a warning here the correct thing to do? If so, then an :okwarning: needs to go in io.rst on L756

I think so. First row has 3 values and the rest - 4. Where in in.rst should :okwarning: be added? maybe there's another way to do that. It shouldn't be a common warning.

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

conceptually this is ok. pls merge master and will re-look (and yes we would have to either fix the warnings or assert_produces_warning, though prob should fix the incorrect usages).

@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

hmm this looks like overlapping with #38587

@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

closing in favor of #38587

@jreback jreback closed this Jan 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent behavior of read_csv when given an additional value on the first row of CSV file
5 participants